Faster Scannerless GLR Parsing
نویسندگان
چکیده
Analysis and renovation of large software portfolios requires syntax analysis of multiple, usually embedded, languages and this is beyond the capabilities of many standard parsing techniques. The traditional separation between lexer and parser falls short due to the limitations of tokenization based on regular expressions when handling multiple lexical grammars. In such cases scannerless parsing provides a viable solution. It uses the power of context-free grammars to be able to deal with a wide variety of issues in parsing lexical syntax. However, it comes at the price of less efficiency. The structure of tokens is obtained using a more powerful but more time and memory intensive parsing algorithm. Scannerless grammars are also more non-deterministic than their tokenized counterparts, increasing the burden on the parsing algorithm even further. In this paper we investigate the application of the Right-Nulled Generalized LR parsing algorithm (RNGLR) to scannerless parsing. We adapt the Scannerless Generalized LR parsing and filtering algorithm (SGLR) to implement the optimizations of RNGLR. We present an updated parsing and filtering algorithm, called SRNGLR, and analyze its performance in comparison to SGLR on ambiguous grammars for the programming languages C, Java, Python, SASL, and C++. Measurements show that SRNGLR is on average 33% faster than SGLR, but is 95% faster on the highly ambiguous SASL grammar. For the mainstream languages C, C++, Java and Python the average speedup is 16%.
منابع مشابه
Faster, Practical GLL Parsing
Generalized LL (GLL) parsing is an extension of recursivedescent (RD) parsing that supports all context-free grammars in cubic time and space. GLL parsers have the direct relationship with the grammar that RD parsers have, and therefore, compared to GLR, are easier to understand, debug, and extend. This makes GLL parsing attractive for parsing programming languages. In this paper we propose a m...
متن کاملUniversity of Amsterdam Programming Research Group Scannerless Generalized-LR Parsing
for useful suggestions and comments on previous versions of this paper. Incremental parser generation and context-sensitive disambiguation: a multidisciplinary perspective.
متن کاملDisambiguation Filters for Scannerless Generalized LR Parsers
Several real-world problems call for more parsing power than is offered by the widely used and well-established deterministic parsing techniques. These techniques also create an artificial divide between lexical and context-free analysis phases, at the cost of significant complexity at their interface. In this paper we present the fusion of generalized LR parsing and scannerless parsing. This c...
متن کاملDeclarative, Formal, and Extensible Syntax Definition for AspectJ A Case for Scannerless Generalized-LR Parsing
Aspect-Oriented Programming (AOP) is attracting attention from both research and industry, as illustrated by the ever-growing popularity of AspectJ, the de facto standard AOP extension of Java. From a compiler construction perspective, AspectJ is interesting as it is a typical example of a compositional language, i.e. a language composed of a number of separate languages with different syntacti...
متن کاملDynGenPar - A Dynamic Generalized Parser for Common Mathematical Language
This paper introduces a dynamic generalized parser aimed primarily at natural mathematical language. Our algorithm combines the efficiency of GLR parsing, the dynamic extensibility of tableless approaches and the expressiveness of extended context-free grammars such as parallel multiple context-free grammars (PMCFGs). In particular, it supports efficient dynamic rule additions to the grammar at...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009